Part A: Random decision policy and optimal policy learnt via Q-learning
Part B: Learning via linear approximation of the Q-function using hand-crafted features
Author: Aditya Patel
#!pip install gym
import numpy as np
import gym
import random, time
from IPython.display import clear_output
np.set_printoptions(suppress=True)
from IPython.display import Video
env = gym.make("Taxi-v3").env
action_space_size = env.action_space.n
state_space_size = env.observation_space.n
# ?env.env
GAME DESCRIPTION
Type: TaxiEnv
String form: <TaxiEnv<Taxi-v3>>
File: ~/gym/gym/envs/toy_text/taxi.py
Docstring:
The Taxi Problem
from "Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition"
by Tom Dietterich
Description: There are four designated locations in the grid world indicated by R(ed), G(reen), Y(ellow), and B(lue). When the episode starts, the taxi starts off at a random square and the passenger is at a random location. The taxi drives to the passenger's location, picks up the passenger, drives to the passenger's destination (another one of the four specified locations), and then drops off the passenger. Once the passenger is dropped off, the episode ends.
Observations: There are 500 discrete states since there are 25 taxi positions, 5 possible locations of the passenger (including the case when the passenger is in the taxi), and 4 destination locations. Note that there are 400 states that can actually be reached during an episode. The missing states correspond to situations in which the passenger is at the same location as their destination, as this typically signals the end of an episode. Four additional states can be observed right after a successful episode, when both the passenger and the taxi are at the destination. This gives a total of 404 reachable discrete states.
5 passenger locations: 0: R(ed), 1: G(reen), 2: Y(ellow), 3: B(lue), 4: in the taxi
4 destinations: 0: R(ed), 1: G(reen), 2: Y(ellow), 3: B(lue)
Actions of agent/taxi: There are 6 discrete deterministic actions: 0: move south, 1: move north, 2: move east, 3: move west, 4: pickup passenger, 5: drop off passenger
Rewards: There is a default per-step reward of -1, except for delivering the passenger, which is +20, or executing "pickup" and "drop-off" actions illegally, which is -10.
Rendering:
state space is represented by a tuple (taxi_row, taxi_col, passenger_location, destination)
The filled square represents the taxi, which is yellow without a passenger and green with a passenger.
The pipe ("|") represents a wall which the taxi cannot cross.
R, G, Y, B are the possible pickup and destination locations. The blue letter represents the current passenger pick-up location, and the purple letter is the current destination.
env.reset() # reset environment to a new, random state
env.render()
print("Action Space {}".format(env.action_space))
print("State Space {}".format(env.observation_space))
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
Action Space Discrete(6)
State Space Discrete(500)
state = env.encode(4, 4, 1, 0) # (taxi row, taxi column, passenger index, destination index)
print("State:", state)
env.s = state # taxi is asked to pickup from G ---> dropoff at R and taxi will be starting at cell (4,4) bottom right in grid
env.render()
State: 484
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
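The encoded state 484 above can be reproduced by hand. This is a sketch of the mixed-radix arithmetic Taxi-v3 uses internally (5 rows x 5 columns x 5 passenger slots x 4 destinations = 500 states); the `encode` helper below is a local illustration, not part of gym:

```python
def encode(taxi_row, taxi_col, passenger, destination):
    # Pack the 4-tuple into a single index, mixed-radix style:
    # rows and columns have radix 5, passenger radix 5, destination radix 4.
    s = taxi_row
    s = s * 5 + taxi_col
    s = s * 5 + passenger
    s = s * 4 + destination
    return s

print(encode(4, 4, 1, 0))  # 484, matching the State printed above
```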
"""
When the Taxi environment is created, there is an initial Reward table that's also created, called `P`.
We can think of it like a matrix that has the number of states as rows and number of actions as columns,
i.e. a states x actions matrix.
"""
Here the transition data is stored as a dictionary.
env.P[17] # This dictionary has the structure {action: [(probability, nextstate, reward, done)]}.
# """These are the transitions available from state 17: for each of the 6 actions,
# the resulting next state, the reward the agent gets, and whether the episode ends."""
{0: [(1.0, 117, -1, False)],
1: [(1.0, 17, -1, False)],
2: [(1.0, 37, -1, False)],
3: [(1.0, 17, -1, False)],
4: [(1.0, 17, -10, False)],
5: [(1.0, 1, -1, False)]}
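To see why these particular next states appear, the state index can be decoded back into its tuple. A sketch of the inverse arithmetic (`decode` below is a local illustration of what `env.decode` does): state 17 is taxi at (0, 0), passenger index 4 (in the taxi), destination 1 (G), which explains e.g. action 0 (south) leading to state 117 one row down.

```python
def decode(s):
    # Unpack a state index into (taxi_row, taxi_col, passenger, destination),
    # peeling off the smallest radix first.
    destination = s % 4; s //= 4
    passenger = s % 5; s //= 5
    taxi_col = s % 5; s //= 5
    taxi_row = s
    return taxi_row, taxi_col, passenger, destination

print(decode(17))   # (0, 0, 4, 1)
print(decode(117))  # (1, 0, 4, 1) -- one row south of state 17
```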
To go from G to R, a human would take 8-15 steps depending on the starting position.
As seen below, suppose the taxi is at the bottom right of the grid and is asked to pick the passenger up at G and drop off at R. The taxi takes a random sequence of decisions to accomplish this. The locations can be changed to any arbitrary ones.
env.s = 484
env.render()
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
env.s = 484  # initialize
unitsteps = 0
penalties, reward = 0, 0
frames = []  # for animation
done = False

while not done:
    action = env.action_space.sample()
    state, reward, done, info = env.step(action)

    # Put each rendered frame into a dict for animation
    frames.append({
        'frame': env.render(mode='ansi'),
        'state': state,
        'action': action,
        'reward': reward
    })

    if state == env.encode(0, 4, 4, 0):
        print('passenger picked up at step # {}'.format(unitsteps))
    if reward == 20:
        print('dropped successfully')
        print(unitsteps)
    unitsteps += 1

print("Timesteps taken: {}".format(unitsteps))
#print("Penalties incurred: {}".format(penalties))
passenger picked up at step # 424
passenger picked up at step # 425
passenger picked up at step # 426
passenger picked up at step # 427
passenger picked up at step # 428
passenger picked up at step # 503
passenger picked up at step # 581
passenger picked up at step # 900
passenger picked up at step # 901
passenger picked up at step # 907
passenger picked up at step # 908
passenger picked up at step # 913
passenger picked up at step # 922
passenger picked up at step # 923
dropped successfully
2655
Timesteps taken: 2656
The random policy took 2,656 steps above. Since the actions are random, each re-run can take hundreds or even thousands of steps.
from IPython.display import clear_output
from time import sleep

def print_frames(frames):
    for i, frame in enumerate(frames):
        clear_output(wait=True)
        #env.render()
        print(frame['frame'])
        print(f"Timestep: {i + 1}")
        print(f"State: {frame['state']}")
        print(f"Action: {frame['action']}")
        #print(f"Reward: {frame['reward']}")
        sleep(.05)
print_frames(frames)
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
(Dropoff)
Timestep: 2656
State: 0
Action: 5
Video('/Users/namita/Desktop/Python_programs/RL/random.mp4', width = 400, height = 400,embed=True)
As seen above, if the taxi takes random decisions, it reaches the destination only after hundreds or thousands of steps.
Part A (continued): Self-driving taxi using the discrete-time Q-learning algorithm for Reinforcement Learning
Q-learning is a model-free reinforcement learning algorithm that learns the value of taking an action in a particular state. For any finite Markov decision process (FMDP), Q-learning finds an optimal policy/strategy that maximizes the expected value of the total reward over all successive steps, starting from the current state. Q-learning can identify an optimal action-selection policy for any given FMDP.
Think of the environment as the situation, i.e. the world. env.reset() gives the agent its first situation. Note that the environment's `P` table holds transitions for all 500 states as prior knowledge. The agent has to follow the best policy when taking decisions to reach the goal.
The situation is not just about the taxi and where it is; it also includes where the passenger is and what the destination is. So the agent sits in that initial situation, thinking about what to do, and then takes an action as per its policy.
Based on the action, its situation changes. It may now be closer to the passenger but farther from the destination, or it may have become farther from both. Either way, its situation changed slightly, and the reason it changed is that it took an action, say going north. If it stands there without taking any action, its situation does not change. The agent cannot be lazy: if it takes no action, its situation remains the same, good or bad!
The way to take an action is env.step(action). But before that, to start the game, the agent asks for its first situation; that is what env.reset() does. So the game begins with a situation. The agent then takes an action, gets a reward, and sees a new situation, which may be better or worse than the prior one. It needs to take actions that maximize the chance of reaching the ideal situation, in which it has dropped the passenger off at the destination. That situation has a specific number. Suppose the destination is location 0; then the ideal situation would be encoded as (0, 0, 0, 0), meaning the taxi is in row 0, column 0, the passenger is at location 0, and the destination is location 0. The agent needs to take sequential actions to reach that state, even though it may never have gone to destination 0 before.
Note that env.step(action) changes the environment, i.e. the situation of the agent.
"""
Suppose u are standing at center, you can go North, South, east,west.
25
|
|
---$10---$10 - center ------$5----$50----$30
|
|
$20
|
|
If I am the agent who has to take a decision as to which direction to go, if I am given
the Q-value function for each of 4 actions, I can decide I should go east. Although
immediate reward is $5, discounted present value of all rewards is $80, total Q-value
ie. benefits is $85 by taking action of going East.
Problem is I have to find all these dollar Q-values. I don't know them. I have to use
Q-learning algorithm to find the optimal Q-values. For that I need data on
(state,action,reward,newstate) tuples. I need 1000s of such tuples to train my model.
To do Q-learning, i.e. to learn the Q-value function for each (state, action)
pair, we need experience. Experience is defined as: what state were you in, what
action did you take, what reward did you get, and what new state did you end up
in -- i.e. (current state, current action, reward, new state).
So we need a dataset such as
(current state1, current action1, reward, new state)
(current state1, current action2, reward, new state)
(current state1, current action3, reward, new state)
(current state2, current action1, reward, new state)
(current state2, current action2, reward, new state)
(current state3, current action3, reward, new state)
(current state3, current action1, reward, new state)
(current state4, current action2, reward, new state)
etc. You need this dataset to start training, i.e. to find the optimal Q-value function.
You must define a target state too.
- Reward (immediate) is not the Q-value"""
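The dollar arithmetic in the diagram above can be checked in a few lines. This sketch assumes, for simplicity, a discount factor of 1, so future rewards are not scaled down:

```python
gamma = 1.0                  # assumed discount factor for this illustration
immediate_reward = 5         # first step East
future_rewards = [50, 30]    # rewards collected on the remaining eastward steps

# Discounted present value of everything after the first step
present_value = sum(gamma ** (t + 1) * r for t, r in enumerate(future_rewards))
q_value_east = immediate_reward + present_value
print(q_value_east)  # 85.0
```

With gamma < 1, the $50 and $30 would shrink and going East might no longer be the best choice; the Q-value always weighs immediate reward against discounted future rewards.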
"""
new qvalue for a given state,action pair is equal to old estimate of qvalue + learning rate
x ( reward + discount factor alpha x Q-value for best action in next/future state
you ended up in - Qvalue of current state,action pair
True/False
Each state has 1 Q-value - False
Each state has 1 Q-value for every 1 possible action.
"""
Slide Source: Prof. Lei Ying (University of Michigan)
i ------> current state/situation the agent sees
u ------> current action the agent takes
j ------> next state/situation the agent sees
r ------> reward
beta ---> learning rate
Q_i_u --> Q-value from the table for the (state, action) pair (i, u)
What are we updating? The Q-value!
The Q-value tells how valuable it is for the agent to take a certain action from a certain
state, i.e. the Q-value of (state, action). It is like a chess piece at position a2 on the
board: where should it move so as to maximize not just the immediate reward but also the
discounted present value of all future rewards, up to time = infinity, that it can get by
making that move.
To write the optimization equation, I need i, u, Q(i, u), the list of all possible actions v
in the next state, r, and beta.
class Taxi:
    def __init__(self):
        self.epsilon = 0.1
        self.beta = 0.05   # learning rate
        self.alpha = 0.6   # discount factor
        self.qtable = np.zeros((env.observation_space.n, env.action_space.n))

    def train(self, number=10001):
        print('Taxi will undergo training imparted by Q-learning and the Bellman equation')
        for episode in range(number):
            i = env.reset()
            done = False
            step = 0  # to keep track of when it converged
            while not done:
                """ Getting ready for optimization """
                if random.uniform(0, 1) < self.epsilon:
                    u = env.action_space.sample()   # Explore action space
                else:
                    u = np.argmax(self.qtable[i])   # Exploit learned values
                """ Now I have i, u: the first situation and the agent's first action.
                    But u has not been executed yet! The agent must take the action
                    to get to situation j. It does that in the next line of code. """
                j, r, done, info = env.step(u)
                """ So I have i, u, j, r, beta; qtable[i, u] is available since I've
                    initialized the qtable. Now I need the highest Q-value, corresponding
                    to the best action, when the agent is in state j. """
                max_v = np.max(self.qtable[j])
                """ Now I have all terms needed to optimize. Optimization begins NOW """
                self.qtable[i, u] = self.qtable[i, u] + self.beta * (r + self.alpha * max_v - self.qtable[i, u])
                """ One optimization iteration is over. Q_i_u was updated exactly once.
                    For the 2nd iteration, what I'm calling the next situation j becomes
                    the current situation. Note that for situation j the agent has not
                    taken any action yet at this point in the code. """
                i = j
            """ Just for tracking; the next lines are optional """
            if episode % 1000 == 0:
                clear_output(wait=True)
                print(f"Taxi is training wait ..: {episode}/{number}")
        print("Taxi now trained using Reinforcement Learning.\n")
        return self.qtable
    def drive(self, row=4, col=4, pickup='G', dropoff='R'):
        self.qtable = self.train(10001)
        d = {'R': 0, 'G': 1, 'Y': 2, 'B': 3}
        pk, dp = d[pickup], d[dropoff]   # convert R,G,Y,B to 0,1,2,3
        coords = {'R': [0, 0], 'G': [0, 4], 'Y': [4, 0], 'B': [4, 3]}
        env.s = env.encode(row, col, pk, dp)
        i = env.s
        unitsteps = 0
        frames = []
        done = False
        print(self.qtable)
        while not done:
            u = np.argmax(self.qtable[i])
            action_rendering = {0: 'South', 1: 'North', 2: 'East', 3: 'West',
                                4: 'In taxi', 5: 'Dropped off'}
            j, r, done, info = env.step(u)
            frames.append({
                'frame': env.render(mode='ansi'),
                'state': i,
                'action': action_rendering[u],
                'reward': r
            })
            if i == env.encode(coords[pickup][0], coords[pickup][1], d[pickup], d[dropoff]):
                print('passenger picked up at step # {}'.format(unitsteps))
            if r == 20:
                print('dropped successfully at step')
                print(unitsteps)
                break
            unitsteps += 1
            i = j
        print("Timesteps taken: {}".format(unitsteps))
from IPython.display import clear_output
from time import sleep

def print_frames(frames):
    for i, frame in enumerate(frames):
        clear_output(wait=True)
        #env.render()
        print(frame['frame'])
        print(f"Timestep: {i + 1}")
        print(f"State: {frame['state']}")
        print(f"Action: {frame['action']}")
        print(f"Reward: {frame['reward']}")
        if i == 0:
            sleep(2)
        else:
            sleep(0.4)

print_frames(frames)
env.render()
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
(Dropoff)
Suppose that by default the taxi is sitting somewhere, like an Uber driver between two rides. It gets a message to pick up a passenger at 'G' and deliver them to 'R'. I can specify any pickup and drop-off location, and also make the taxi start from any position of my choice, to test that it works.
taxi = Taxi()
taxi.drive(4,4,'Y','R')
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
(Dropoff)
Timestep: 14
State: 16
Action: Dropped off
Reward: 20
Video('/Users/namita/Documents/Qlearning.mp4', width = 800, height = 800,embed=True)
Part B: Hand-crafted featurizer and Q-learning using linear approximation
To solve the same problem using linear approximation, reducing the size of the Q-table, which would otherwise hold 500 x 6 = 3000 Q-values.
Source: Prof. Lei Ying, University of Michigan
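The idea, as a minimal sketch: instead of storing one Q-value per (state, action) pair, approximate Q(s, a) as a dot product theta . phi(s, a) and nudge theta with a temporal-difference step. The feature vector and constants below are illustrative assumptions, not values from the actual training run further down:

```python
import numpy as np

theta = np.zeros(3)                  # weight vector replaces the whole Q-table
phi_sa = np.array([0.8, 1.6, 0.0])   # hypothetical features for one (state, action)
r, alpha, beta = -1.0, 0.9, 0.2      # reward, discount factor, learning rate
max_next_q = 0.0                     # best approximated Q-value in the next state

delta = r + alpha * max_next_q - theta @ phi_sa   # TD error
theta = theta + beta * delta * phi_sa             # gradient-style update on theta
print(theta)  # theta moves in the direction of phi_sa, scaled by beta * delta
```

The update generalizes across states: every (state, action) pair whose features resemble phi_sa gets its approximated Q-value adjusted at once, which is both the appeal and the risk of this approach.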
def phi(xk, uk):
    """
    Inputs:
        xk - Integer from 0 to 499 indicating one of the 500 states
        uk - Integer from 0 to 5 indicating the action of the agent/taxi
    Output:
        Feature vector for the given (state, action) pair
    """
    # decode state from decimal to tuple
    taxi_row, taxi_col, passenger, destination = env.decode(xk)  # taxi row, taxi col, passenger, destination
    feature_vec = np.zeros(18)
    # decode passenger/destination indices into row and column (B is at (4, 3),
    # matching the coords used in the drive method above)
    dic = {0: [0, 0], 1: [0, 4], 2: [4, 0], 3: [4, 3], 4: [taxi_row, taxi_col]}
    passenger_location = dic[passenger]
    destination_location = dic[destination]
    taxi_location = [taxi_row, taxi_col]
    dist_to_passenger = np.abs(np.array(taxi_location) - np.array(passenger_location)).sum() / 5
    dist_to_destination = np.abs(np.array(taxi_location) - np.array(destination_location)).sum() / 5
    position = uk * 1   # NOTE: with 3 features per action, uk * 3 would give each action
                        # its own slots; uk * 1, as used here, lets adjacent actions overlap
    feature_vec[position] = dist_to_passenger
    feature_vec[position + 1] = dist_to_destination
    if [taxi_row, taxi_col] == passenger_location:
        feature_vec[position + 2] = 1
    else:
        feature_vec[position + 2] = 0
    return feature_vec
phi(484,0)
array([0.8, 1.6, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ])
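The first two entries of phi(484, 0) can be verified by hand: state 484 puts the taxi at (4, 4); passenger index 1 means the pickup is G at (0, 4), and destination index 0 means the drop-off is R at (0, 0). The distances are Manhattan distances scaled by 1/5:

```python
taxi = (4, 4)
passenger = (0, 4)     # G
destination = (0, 0)   # R

# Scaled Manhattan distances, matching the first two entries of the feature vector
dist_to_passenger = (abs(taxi[0] - passenger[0]) + abs(taxi[1] - passenger[1])) / 5
dist_to_destination = (abs(taxi[0] - destination[0]) + abs(taxi[1] - destination[1])) / 5
print(dist_to_passenger, dist_to_destination)  # 0.8 1.6
```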
# Example: the resulting feature vector for starting state 484 and action 0
# I converted the (state, action) space of size 3000 (500 states x 6 actions) into a feature vector of size 18
alpha = 0.9 # Discount factor
beta= 0.2 # Learning rate.
beta_rate_decay = 0.995 # decay the learning rate as the training proceeds.
min_beta_rate = 0.001
epsilon = 0.7 # For the epsilon-greedy exploration.
epsilon_decay = 0.5 # decay the epsilon as the training proceeds.
#--------------------------------
theta = np.zeros(18)
def select_action(state):
    qvalue_dict = {}
    for action in [0, 1, 2, 3, 4, 5]:
        feature_vec = phi(state, action)
        qvalue_dict[action] = np.dot(theta, feature_vec)
    action_best = max(qvalue_dict, key=qvalue_dict.get)
    return action_best
for episode in range(10001):
    i = env.reset()
    done = False
    step = 0  # to keep track of when it converged
    while not done:
        """ Getting ready for optimization """
        theta_old = theta   # theta is rebound to a new array below, so this keeps the old vector
        if random.uniform(0, 1) < epsilon:
            u = env.action_space.sample()   # Explore action space
        else:
            u = select_action(i)            # Exploit learned values
        """ Now I have i, u: the first situation and the agent's first action.
            But u has not been executed yet! The agent must take the action
            to get to situation j. It does that in the next line of code. """
        j, r, done, info = env.step(u)
        """ Now I need the highest approximated Q-value, corresponding to the
            best action, when the agent is in state j. """
        dot_product = []
        for v in [0, 1, 2, 3, 4, 5]:
            dot_product.append(np.dot(theta, phi(j, v)))
        max_v = max(dot_product)
        """ Now I have all terms needed to optimize. Optimization begins NOW """
        delta_k = r + alpha * max_v - np.dot(theta, phi(i, u))
        theta = theta + (beta * delta_k * phi(i, u))
        if np.max(theta - theta_old) < 0.01:
            break
        i = j
        if r == 20:
            print('dropped successfully at step')
            print(step)
            break
        step += 1
    if episode % 100 == 0:
        clear_output(wait=True)
        print(f"Taxi is training wait ..: {episode}")
print("Taxi now trained using Reinforcement Learning.\n")
Taxi is training wait ..: 10000
Taxi now trained using Reinforcement Learning.
theta
array([-4.86735192, -3.74559771, -4.16088422, -4.64604911, -6.91572212,
-9.64440762, -8.09732866, -7.09043179, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ])
"""To update theta I need:
old theta from theta array
beta
delta_k
phi(xk,uk)
Furthermore for delta_k, I need reward r_xk_uk, alpha, max approximated Q-value from Q-table
for x+1 i.e. next state, old theta vctor, phi(xk, uk)
"""
env.reset()
env.s = 484
unitsteps, penalties, reward = 0, 0, 0
done = False
frames = []
for episode in range(10):
    xk = env.s
    for step in range(100):
        # Pick the greedy action under the learned linear approximation
        decision = {}
        for uk in [0, 1, 2, 3, 4, 5]:
            decision[uk] = np.dot(theta, phi(xk, uk))
        action = max(decision, key=decision.get)
        print(action)
        xk_plus1, r_xk_uk, done, info = env.step(action)
        print(xk_plus1)
        if xk_plus1 == env.encode(0, 4, 4, 0):
            print('passenger picked up at step # {}'.format(unitsteps))
        if r_xk_uk == 20:
            print('dropped successfully')
            print(step)
            break
        xk = xk_plus1
2 484 2 484 2 484 ... (output truncated: the greedy policy keeps choosing action 2 while the state remains 484; the pattern repeats for all 1,000 printed steps and the taxi never moves)
Conclusion: the linear approximation does not converge. I tried many different features; none converged. A better and far "easier" way is to run this through a deep neural network in Keras and let the network figure out the features on its own. This was tried here for intuition, and it failed!